In astro-ph/0702542, Linder and Miquel seek to criticize the use of Bayesian model selection for data analysis and for survey forecasting and design. Their discussion is based on three serious misunderstandings of the conceptual underpinnings and application of model-level Bayesian inference, which invalidate all their main conclusions. Their paper includes numerous further inaccuracies, including an erroneous calculation of the Bayesian Information Criterion. Here we seek to set the record straight.
I. INTRODUCTION

In a recent paper, Linder and Miquel [1] have mounted a vigorous attack on the use of model selection techniques in cosmology, particularly with regard to interpreting (forecasting) the outcome of (upcoming) surveys and in survey design applications. They instead advocate a frequentist parameter-fitting technique.

Our aim in this short note is to highlight important misunderstandings that invalidate all the main conclusions of their paper. In the process, we give a brief self-contained discussion of the model selection framework; for more details see e.g. Refs. […]. In the Appendix we highlight some specific inaccuracies in Ref. [1], many of which are consequences of the general misunderstandings outlined in the main body of this Comment.



II. WHAT IS BAYESIAN MODEL SELECTION? 

In Bayesian inference, model parameters are taken 
as random variables, because this allows propagation of 
the experimental measurement errors into self-consistent 
probabilistic statements about parameter uncertainties. 

The first step of Bayesian parameter estimation is the choice of a model ($M_i$), which specifies a set of parameters ($\theta_i$) to be varied in fitting to the data, along with a set of prior probability ranges $P(\theta_i|M_i)$ for those parameters. Given a particular set of data $D$, the likelihood $P(D|\theta_i, M_i)$ is used to update the prior probabilities to the posterior

$$ P(\theta_i|D,M_i) = \frac{P(D|\theta_i,M_i)\,P(\theta_i|M_i)}{P(D|M_i)} . \qquad (1) $$

The posterior $P(\theta_i|D, M_i)$ contains all the information about the state of knowledge on the parameters $\theta_i$ after the arrival of the data. From this one can construct 'credible intervals', i.e. ranges encompassing, say, 68% or 95% of the posterior probability for the parameters.

We remark that Bayesian credible intervals have a profoundly different meaning from frequentist confidence regions, in which model parameters are not random variables but fixed unknown quantities. The fact that the two intervals are formally equal in the case of a Gaussian likelihood (and flat priors, in the Bayesian scheme) is traceable to the symmetry between the measured mean and the 'true' mean entering the Gaussian distribution. This formal equivalence can engender considerable confusion as to the different interpretations of the final result (for a detailed discussion see Ref. […]).

Bayesian model selection (or comparison) is the extension of the parameter estimation framework to include multiple models, with different parameter vectors and priors. Bayes' theorem can be applied again to update a prior model probability by the evidence, also known as the marginal likelihood of the model, which is the normalization constant in Eq. (1):

$$ P(D|M_i) = \int P(D|\theta_i,M_i)\,P(\theta_i|M_i)\,d\theta_i . \qquad (2) $$

The evidence is the probability of the data given the model. Bayes' theorem is then used to obtain the probability of the model given the data,

$$ P(M_i|D) \propto P(D|M_i)\,P(M_i) , \qquad (3) $$

where $P(M_i)$ is the prior model probability. It is clear from the above equations that the evidence is the basis of the model comparison and is built upon the parameter estimation step.
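To make Eqs. (1) and (2) concrete, here is a minimal numerical sketch for a hypothetical one-parameter model: a single datum $d$ with Gaussian error $\sigma$ and a Gaussian prior on $\theta$ (all numerical values are illustrative assumptions, not taken from any analysis discussed here). The evidence is obtained by integrating likelihood times prior on a grid with the trapezoid rule, checked against the closed form that holds in this Gaussian case, and then used to normalize the posterior.

```python
import math

def normal_pdf(x, mean, sd):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, n=20001):
    """Trapezoid-rule integral of f over [lo, hi] using n equally spaced points."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (f(lo) + f(hi))
    for j in range(1, n - 1):
        total += f(lo + j * h)
    return total * h

# Hypothetical model M: one datum d ~ Normal(theta, sigma^2), prior theta ~ Normal(mu0, tau^2).
d, sigma, mu0, tau = 0.8, 0.2, 0.0, 1.0

def likelihood(theta):
    return normal_pdf(d, theta, sigma)   # P(D|theta, M)

def prior(theta):
    return normal_pdf(theta, mu0, tau)   # P(theta|M)

# Eq. (2): the evidence is the likelihood integrated over the prior.
evidence = integrate(lambda t: likelihood(t) * prior(t), -10.0, 10.0)

def posterior(theta):
    # Eq. (1): the evidence is exactly the normalization constant.
    return likelihood(theta) * prior(theta) / evidence

# Closed form for this Gaussian case: marginally, d ~ Normal(mu0, sigma^2 + tau^2).
evidence_exact = normal_pdf(d, mu0, math.sqrt(sigma**2 + tau**2))
print(evidence, evidence_exact)
```

A grid is only practical in very low dimension; it is used here purely to illustrate the definitions.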

All of the above is uncontroversial mathematics, providing a consistent and systematic inference system for evolving probabilities in light of experimental data. Any controversy about Bayesian methods centres on the explicit need to state the full set of prior information in order to do any calculation. The framework provides no guidance as to how to do this; instead, physical insight is needed to select suitable models for comparison with the data, and to assess their initial probabilities and the priors on the model parameters in advance of that comparison. Within a model, prior parameter ranges can be thought of as plausible regions of parameter space that are accessible to the model.

Bayes' theorem can be thought of as a decomposition of the final result into prior knowledge and the likelihoods measuring the information coming from the data. Indeed, the Bayesian formalism forces us to state explicitly which part of the result is due to our assumptions, and which part is driven by the data. The hope or expectation, at both the parameter level and the model level, is that data will be obtained of sufficient quality to overturn incorrect prior hypotheses. If there is a broad range of possible models, corresponding to different prior choices, more data will be required to converge to a robust conclusion. But in the Bayesian approach data will eventually overcome prior choices; the wider the range of plausible priors, the more data we can expect to need before a firm conclusion can be drawn.
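The claim that data eventually overcome prior choices can be illustrated at the parameter level with a standard conjugate Gaussian update. The sketch below assumes two hypothetical analysts with quite different Gaussian priors on the same parameter, and idealized data whose sample mean stays fixed as the number of measurements `n` grows; the gap between their posterior means shrinks as `n` increases.

```python
import math

def gaussian_posterior(prior_mean, prior_sd, data_mean, noise_sd, n):
    """Conjugate update for a Gaussian mean: Gaussian prior, n measurements with known noise.

    Returns (posterior mean, posterior standard deviation).
    """
    prior_prec = 1.0 / prior_sd**2   # precision contributed by the prior
    data_prec = n / noise_sd**2      # precision contributed by the data
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, 1.0 / math.sqrt(post_prec)

# Two hypothetical analysts with different (assumed) priors: (mean, standard deviation).
priors = {"A": (0.0, 0.1), "B": (2.0, 1.0)}
observed_mean, noise_sd = 1.0, 1.0  # idealized: sample mean held fixed as n grows

for n in (1, 10, 100, 10000):
    mean_a, _ = gaussian_posterior(*priors["A"], observed_mean, noise_sd, n)
    mean_b, _ = gaussian_posterior(*priors["B"], observed_mean, noise_sd, n)
    print(f"n={n:>5}: A={mean_a:.3f}  B={mean_b:.3f}  gap={abs(mean_a - mean_b):.3f}")
```

The weight of the prior is its precision relative to the accumulated data precision, which is why a tighter (more opinionated) prior, like analyst A's, takes more data to overturn.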

The Bayesian evidence sets up a tension between the ability of a model to fit the data and the prior predictiveness of the model, in a quantitative implementation of Occam's razor. Note that we prefer to use the term 'predictiveness' rather than 'simplicity/complexity'; the former is what is actually rewarded by the evidence, and it is not necessarily directly related to, for instance, the number of parameters. The models that do best are the ones that make specific predictions that later turn out to fit the data well. Less predictive models, even if they can fit the data equally well, score more poorly. The Bayesian evidence has been widely applied to cosmological problems in recent years […].
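The statement that the evidence rewards predictiveness rather than parameter count can be checked in a toy case. In the sketch below, two hypothetical one-parameter models describe the same single measurement and differ only in the width of their (assumed uniform) prior. Both priors contain the best fit, so the maximum likelihoods are identical, yet the evidence of Eq. (2) favours the more predictive model by roughly the ratio of prior widths.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def evidence_uniform_prior(d, sigma, half_width):
    """P(d|M) for one datum d ~ Normal(theta, sigma^2), theta ~ Uniform(-half_width, half_width).

    Eq. (2) reduces to the likelihood mass inside the prior box divided by the box width.
    """
    mass = normal_cdf((half_width - d) / sigma) - normal_cdf((-half_width - d) / sigma)
    return mass / (2.0 * half_width)

d, sigma = 0.3, 0.1  # hypothetical measurement and error bar
z_specific = evidence_uniform_prior(d, sigma, half_width=1.0)  # sharper prediction
z_vague = evidence_uniform_prior(d, sigma, half_width=100.0)   # same parameter count, vaguer prediction

# Same best fit, same number of parameters: the ratio is set by the prior widths (200/2).
print(z_specific / z_vague)
```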

Some statistics have been extensively used as proxies for the actual evidence, such as the Bayesian Information Criterion (BIC) […, 13]. But unlike the evidence, these approximations are often biased and by construction disfavour models with more parameters, even when those parameters are not constrained by the data (see Sec. III C for more on this in relation to the evidence, and Ref. […] for a discussion of the limitations of some information-criterion-based approaches). Wherever possible the full evidence should be used.
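The contrast between the BIC and the full evidence can be seen in an idealized nested comparison. The BIC is $-2\ln\mathcal{L}_{\max} + k\ln N$, and $-\Delta\mathrm{BIC}/2$ is commonly used as a proxy for the log Bayes factor. The sketch assumes hypothetical models $M_0$ ($\theta = 0$) and $M_1$ ($\theta$ uniform on $[-1, 1]$), with $N$ data points whose sample mean sits exactly at zero and which respond to $\theta$ with a sensitivity `eps` (an illustrative parameter). When `eps` is tiny the parameter is unconstrained: the exact log Bayes factor is close to zero, while the BIC still charges the full $\ln N$ penalty. When the parameter is well measured, the two roughly agree.

```python
import math

def ln_bayes_factor_10(n, eps, sigma=1.0, half_width=1.0):
    """Exact ln[P(D|M1)/P(D|M0)] for the idealized nested test described above.

    M0: theta = 0.  M1: theta ~ Uniform(-half_width, half_width).
    Data: n points with sample mean 0, each measuring eps*theta with noise sigma,
    so L(theta)/L(0) = exp(-a*theta**2/2) with a = n*eps**2/sigma**2.
    """
    a = n * eps**2 / sigma**2
    # Integral of exp(-a*theta^2/2) over the prior box, divided by the box width.
    integral = math.sqrt(2.0 * math.pi / a) * math.erf(half_width * math.sqrt(a / 2.0))
    return math.log(integral / (2.0 * half_width))

def delta_bic_half(n, extra_params=1):
    """BIC proxy -(BIC1 - BIC0)/2 when both models reach the same maximum likelihood."""
    return -0.5 * extra_params * math.log(n)

n = 1000
for eps in (1e-4, 1.0):  # effectively unconstrained vs. well measured
    print(f"eps={eps:g}: ln B10 = {ln_bayes_factor_10(n, eps):+.3f}, "
          f"BIC proxy = {delta_bic_half(n):+.3f}")
```

In the unconstrained case the parameter's contribution factorizes out of the evidence integral, whereas the BIC penalty depends only on $k$ and $N$ and so cannot tell the two cases apart.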



III. MISCONCEPTIONS ABOUT MODEL SELECTION

The paper of Linder and Miquel [1] launches a primarily rhetorical attack on the model selection framework. We will argue here that the paper contains numerous factually incorrect statements. These appear largely to be traceable to three fundamental misunderstandings concerning the Bayesian framework and its applications, which we now describe.



A. Model selection does not replace parameter estimation. It extends it.

Linder and Miquel appear to believe that model selection and parameter estimation are competing techniques. This is incorrect. As described above, model selection extends the Bayesian framework to the model level. Within each model, parameter estimation is carried out in the usual manner, including, as usual, goodness-of-fit and data-subset consistency checks.

Specifically, we see that parameter estimation corresponds to model selection in which the prior model probabilities of all but one model have been set to zero. This seems a regressive step; one can hardly claim that our understanding of, for instance, dark energy is so good that we should focus on only one possible description.

From this perspective, the need to choose model priors is clearly an advantage, not a drawback. Parameter estimation corresponds to one particular choice of those priors. By acknowledging that other choices are possible, a much more wide-ranging and robust investigation of the possible outcomes of future experiments can be made, as was done in Ref. […].

A further advantage of model selection is that it allows one to ask new types of question. As it subsumes parameter estimation, one can obviously still ask about parameter confidence ranges, for instance, either model-by-model or via Bayesian model averaging as in Ref. […]. But one can also ask whether entire models are excluded by data at a given strength of evidence, based on their posterior model probability, or whether data provide support for additional model parameters. Indeed, the current leading questions in dark energy studies are of model selection type: is the equation of state $w$ equal to $-1$, or is it variable? In the latter case, is $w$ constant or time-varying? One can also compare models that are not nested; for instance, is quintessence a better description of the data than a modified gravity model? Such questions are not accessible to parameter fitting analyses and often cannot even be phrased in frequentist terms.
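Eq. (3) and Bayesian model averaging can be sketched in a few lines. The function names and all numbers below (the log-evidences, prior model probabilities and per-model posterior means of $w$) are hypothetical. Setting the prior probability of every model but one to zero recovers ordinary single-model parameter estimation, as noted earlier in this section.

```python
import math

def posterior_model_probs(log_evidences, prior_probs):
    """Normalized P(M_i|D) from ln P(D|M_i) and P(M_i), following Eq. (3)."""
    # Log of evidence x prior; a zero prior removes the model entirely.
    logs = [le + math.log(p) if p > 0 else -math.inf
            for le, p in zip(log_evidences, prior_probs)]
    top = max(logs)  # subtract the largest term before exponentiating
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]

def model_averaged_mean(post_probs, means):
    """Model-averaged posterior mean: sum_i P(M_i|D) * E[param | D, M_i]."""
    return sum(p * mu for p, mu in zip(post_probs, means))

# Hypothetical inputs: model 0 fixes w = -1; model 1 lets w vary (posterior mean -0.95).
ln_z = [-10.0, -11.0]
probs = posterior_model_probs(ln_z, [0.5, 0.5])
w_avg = model_averaged_mean(probs, [-1.0, -0.95])

# Parameter estimation as a special case: all prior model mass on a single model.
probs_single = posterior_model_probs(ln_z, [1.0, 0.0])
print(probs, w_avg, probs_single)
```

Working with log-evidences and subtracting the maximum before exponentiating avoids underflow, since realistic log-evidences can be large negative numbers.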

An important application of model selection is survey forecasting and design, where one assesses or constructs a survey in order to optimize the ability to answer a particular question or questions […, 12]. The details will inevitably depend to some extent on prior assignments, and it is of course important to vary these within reasonable ranges. Model selection forecasting allows optimization for a broader range of possible questions.

Linder and Miquel also claim that survey design based on model selection is betting on the absence of structure in the possible parameter space, apparently confusing the space of possible 'true' models with the likelihood in that space given a particular 'true' model. The opposite is true. By including several models, one can focus attention on particular regions of parameter space that are especially well motivated, for instance ΛCDM, or the locations predicted by one-parameter quintessence models. In fact it is parameter estimation that assumes that the parameter space is a blank canvas in which each point is of equal value.



B. Physical intuition and priors are the same thing! 

Linder and Miquel criticize the Bayesian methodology for giving results that are often dependent on prior assumptions, and simultaneously claim that it seeks to avoid, or even prevent, the use of physical intuition. Apparently they have not realised that physical intuition and priors are the same thing! After all, where do the models that we decide to compare to the data come from? What decides their prior model probabilities, and the reasonable ranges for their parameters? This is where the physics comes in. The mere fact that it can be difficult to put our physical intuition in quantitative terms by selecting prior ranges and prior probabilities is no good reason to give up the exercise.

The Bayesian model selection framework, by allowing us to specify multiple models with both model and parameter priors, maximizes our chance to incorporate physical intuition into data analysis. Linder and Miquel's claims to the contrary hold no substance at all. From this perspective, the prior dependence in Bayesian analysis should be viewed in a positive light, not a negative one, as it allows different intuitions to be tested. Bayes' theorem provides a convenient decomposition into the parts of the conclusions that are data-driven (the parameter and model likelihoods) and those that are prior-driven (the physical intuition), and so one can always keep track of the balance between the two. As long as the data cannot decide the issue, our physical intuition influences our conclusions, but in the Bayesian framework we are made explicitly aware of this situation through the need to specify an explicit prior. There is no inference without assumptions. As the amount and quality of data increase, the priors become less important and conclusions based on our expectations are replaced by conclusions based on actual data. This is how physics should work.



C. Model selection does not act against models whose parameters cannot yet be measured.

Linder and Miquel give a historical overview, titled 'reality check', which seeks to show by example that model selection techniques, if applied in the past, would have led researchers astray. In our view all of this section is incorrect, as we explain in detail in the Appendix. Here we address the reason why Linder and Miquel have gone astray.

Their main mistake is a failure to recognize the difference between two distinct circumstances. The first is a situation where a phenomenon could have been discovered, but wasn't; this corresponds to a likelihood function well localized within the prior of the relevant parameter, but consistent with a zero value. Model selection statistics act against models with the extra parameter in that case (an example being spatial curvature). The second is the situation where observations were of insufficient power to constrain the parameter, corresponding to a flat or nearly flat likelihood across the prior. In this case, the contribution of the parameter factorizes out of the evidence integral, leaving it unchanged. Therefore Bayesian model selection does not act against parameters that are unconstrained by existing data (see Ref. [9] for a detailed discussion). Comparisons of such models are inconclusive, awaiting new data. All the examples they give purporting to show model selection going astray are actually in the second category, and not in the first as they claim.

Their misunderstanding can be partially traced to their use of the BIC [13]. This model selection criterion assumes that all parameters are well measured. If this is not the case, then the BIC will exclude models that are perfectly acceptable when using full Bayesian model comparison, as demonstrated e.g. in Ref. […]. There it was shown that the BIC rules out the 'kink' parametrisation of the dark energy equation of state, in disagreement with the full Bayesian evidence. Indeed, in the derivation of the BIC only the scaling with the number of data points $N$ was kept, whereas even in the idealized case of linear models with Gaussian errors the overall scaling of the log-evidence for a model with $k$ parameters is rather $k\ln(N/k)$ [9]. Additionally there is a term that depends on the size of the error bars relative to the size of the prior, which often dominates. For these reasons, the BIC tends to give an unrealistically high penalty to extra parameters, compared to the full Bayesian evidence, if its underlying assumptions are not met. Only the evidence is a full implementation of Bayesian model comparison.



IV. CONCLUSIONS



We regard model selection techniques as a powerful tool for cosmologists, both for data analysis and for survey forecasting and design. They broaden the range of questions one can ask of present and future observations, and can be applied in a consistent and rigorous framework. While there remains room for debate about the relative merits of frequentist and Bayesian approaches in cosmology, we believe that the many demonstrable flaws of the Linder-Miquel paper prevent it from contributing constructively to that debate.
APPENDIX A: DETAILED CRITIQUE

In this Appendix we provide a detailed critique of some of the points made by Linder and Miquel.



1. Linder and Miquel: Section III/V 

In our view, the discussion in their section III, seeking 
instances from history where model selection would have 
misinformed, is entirely wrong or irrelevant. Since this 
is the sole motivation for their Section V, it too has no 
validity. We take their account paragraph by paragraph. 

a) This paragraph claims that, pre-1998, model selection would have dismissed the now-favoured ΛCDM model. Absolutely not! Data before that epoch were unable to meaningfully constrain Λ. As discussed above, the comparison would have been inconclusive. This is in accord with the fact that in the 1990s papers typically considered several cosmological models, including ΛCDM, on a roughly equal footing. In 1998 better data came along that were able to rule out the critical-density and open models, at which point model selection would correctly pick out the dark energy model.

b) This paragraph mentions Feynman's repackaging of all equations of nature into U = 0. We can see no relevance in this point. Repackaging equations does not change the number of model fit parameters, and hence affects neither parameter estimation nor model selection.

c) This paragraph claims that before 1992 model selection would have argued against structure in the cosmic microwave background (CMB), on the grounds that $C_\ell = 0$ is simpler than independently specifying each $C_\ell$. Absolutely not! This point confuses the data and the models. No one has ever thought that a separate specification of each $C_\ell$ was a model, and certainly not in 1992. Indeed, the acceptable models of the time, the CDM family with or without Λ, all predicted CMB structures that were indeed subsequently seen. Not that this has much to do with model selection; models cannot be rejected before obtaining data that actually constrain them.

d) This paragraph notes that the galaxy two-point correlation function was for many years thought to be a power law, without an underpinning physical model. However, the fact that the power-law model is no longer considered is irrelevant. Would any more physical model have been wrongly ruled out by model selection, had it existed? No is the answer. When data improved, would model selection support such models over the power-law model? Yes. As it should.

e) This paragraph makes a point about there being the 
deeper physics of the halo model behind the matter power 
spectrum, but this has nothing to do with cosmological 
model selection. 

f) This paragraph claims that the modern electroweak theory would have been rejected by model selection had it existed contemporaneously with the Fermi theory during its heyday. Absolutely not! Had the Glashow-Weinberg-Salam model been around in, say, 1950, it would not have been ruled out by model selection because all of its parameters were poorly constrained. Model selection would have been unable to distinguish it from the simpler Fermi model. Later on better data came along and ruled out the simpler model. Just as it should.

g) This paragraph makes the point that seemingly complex phenomenology may have a simple underlying structure, e.g. atomic spectra. There is some relevance to this point, though a 'complicated', say two-parameter, equation-of-state model for dark energy is unlikely to have a substantially simpler 'fundamental' description. Nevertheless, if improved physical understanding comes along and creates a compelling model of that type, then that is the time to try out model selection statistics on it. Such a model can hardly be ruled out before it even exists, nor tested until its predictions are defined.



2. Linder and Miquel: Section IV 

In Section IV the authors advocate a frequentist 'rejection of null hypothesis' test in which ΛCDM is the null hypothesis. This is done by simulating data only for ΛCDM and drawing likelihood contours, with the viability of ΛCDM then interpreted according to the position of actual measurements with respect to those contours. Note that this approach seeks to rule out ΛCDM in favour of a more general dark energy model without ever computing the probability of the data under the latter model.

We first note that the quantity they compute and call BIC is not the BIC. A giveaway is that they are claiming that the lower-likelihood models are preferred. The correct computation of the BIC requires simulation of data at each point in the parameter space, and then a model comparison test of ΛCDM versus the two-parameter dark energy model at each point. We carried out exactly such an analysis, computing the full evidence rather than the BIC, in Ref. […]. Linder and Miquel only simulate ΛCDM, and then simply flip the sign of the relative log-likelihood, which is not equivalent.

On top of the above flaw, Linder and Miquel's argument then goes on to describe a situation in which a frequentist analysis delivers a 90% confidence contour around the ΛCDM model (based on synthetic data) in the $(w_0, w_a)$ plane, and claims that a measurement lying outside that region would exclude ΛCDM at the 90% confidence level. This is an incorrect statement in frequentist statistics, as it slips in the wrong assumption that the probability of the data given the hypothesis (i.e., the frequentist confidence region) is the same as the probability of the hypothesis given the data.¹ The latter quantity is undefined for a frequentist, for whom a hypothesis is either true or false (although we might not know which) and a probabilistic statement about it would be meaningless. For a Bayesian, of course, the two are related through Bayes' theorem.

Linder and Miquel also voice their discontent about the BIC condition being stronger, i.e. making it harder to rule out ΛCDM. But this is hardly surprising, being just a manifestation of Lindley's well-known 'paradox' [15]; as summarized in Appendix A of Ref. […] (see Figure A1), in general frequentist significance tests do not agree with Bayesian model selection, since the former ignore the information gained through the data. This is evident in Figures 1 and 2 of Ref. [8], which is exactly the comparison Linder and Miquel are trying to make. There is no basis to claim, using a frequentist significance test, that the BIC 'spuriously rules out' a particular set of models, because there is no basis to take the frequentist result as the 'truth'. We could equally well say that model-level Bayesian inference demonstrates that parameter estimation 'spuriously rules out ΛCDM' in those circumstances.
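Lindley's paradox can be reproduced with a textbook Gaussian example (an illustration under stated assumptions, not a reproduction of any analysis cited here). In the sketch below the null hypothesis fixes a mean $\mu = 0$, the alternative assigns $\mu$ an assumed Gaussian prior of width `tau`, and the data are held at exactly $z = 1.96$ standard errors from zero for every sample size, so a frequentist test reports $p \approx 0.05$ throughout. The Bayes factor $B_{01}$ nevertheless grows roughly as $\sqrt{n}$, increasingly favouring the null.

```python
import math

def bayes_factor_null(z, n, sigma=1.0, tau=1.0):
    """B01 = P(D|H0)/P(D|H1) for a test of a Gaussian mean.

    H0: mu = 0.  H1: mu ~ Normal(0, tau^2) (an assumed prior).
    Data summarised by the sample mean of n points with known noise sigma,
    placed exactly z standard errors away from zero.
    """
    s2 = sigma**2 / n  # variance of the sample mean
    t2 = tau**2
    # Ratio Normal(xbar; 0, s2) / Normal(xbar; 0, s2 + t2) with xbar = z * sqrt(s2).
    return math.sqrt(1.0 + t2 / s2) * math.exp(-0.5 * z**2 * t2 / (s2 + t2))

z = 1.96  # two-sided p ~ 0.05 for every n below
for n in (10, 100, 1000, 100000):
    print(f"n={n:>6}: B01 = {bayes_factor_null(z, n):.2f}")
```

The fixed-significance result carries ever less weight against the null as the data become more precise, because the alternative spreads its prior predictive probability over a range the data have now excluded.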

It is true that model selection gives a larger parameter area in which ΛCDM would not be ruled out even if it is wrong, though usually returning an inconclusive verdict in that case, to be deferred to future data. The trade-off is that parameter estimation techniques applied to model comparison are much more likely to rule out ΛCDM even if it is right (by not recognizing that the data have lower probability under a less predictive model). One cannot win on both sides of that coin.

Independently of the above misconceptions, Linder and Miquel further claim that if ΛCDM results in a poor likelihood in light of new data, then it should be rejected in favour of a more general (less predictive) model, i.e. one in which $w_0$ and $w_a$ vary freely over any range. It is not clear, for example, why this model was chosen instead of one where $w(z)$ varies in 1000 redshift bins, which would probably achieve an even better fit. In the Bayesian framework we can admit all those models, assigning a prior probability to each that reflects our relative degree of belief based on our understanding of the physical processes at work. One then goes on to compute the posterior probabilities for each of the models.

3. Linder and Miquel: Section II 

We disagree with all statements in Section II of their paper implying that parameter estimation and model selection are distinct endeavours. In addition we note:

1) The statement that we wouldn't want to throw away a tree containing one fit fruit is misleading. If the fit fruit is a better fit than those on other trees, then the goodness-of-fit will be rewarded by model selection. If it is no fitter than those on smaller trees (and is everywhere constrained meaningfully by data) then of course we do want to throw away that tree: this is what Occam's Razor is all about, and without it we have no control over arbitrarily complex models.

2) It is implied that model selection might disadvantage fundamental models that might have apparently complicated phenomenological manifestations. Specific examples mentioned are braneworld models of modified gravity and inverse power-law potentials. This criticism is not true at all; people are welcome, indeed encouraged, to deploy fundamental parameters in model selection rather than phenomenological ones where possible.

4. Rhetoric 

We end by pointing out that this paper uses the rhetorical trick of attributing, without citation, and then rebutting, some vaguely ridiculous assertions supposedly held by model selection advocates. For instance, no one has suggested that model selection techniques should be 'blindly applied' without regard to physical insight, and if they had it would have been a pretty ludicrous suggestion. No one has claimed that parameter fitting is 'misguided', it being a key part of the inference procedure, though we have indeed argued that it is inadequate if one wishes to answer questions phrased at the model level (e.g. is quintessence a better description of data than a particular modified gravity model?). We are also unaware of any cases where 'overenthusiastic application of model selection led to some claims about the probability of future experiments failing to see characteristics such as dynamics that current data cannot access', though we may have been enthusiastic about being able to make probabilistic forecasts under carefully defined prior assumptions […].



¹ To convince oneself of the difference between the two quantities, imagine selecting a person at random; the person can be either male or female (our hypothesis). If the person is female, her probability of being pregnant (our data) is about 3%, i.e. P(pregnant|female) = 0.03. However, if the person is pregnant, the probability that she is female is much larger than that, i.e. P(female|pregnant) ≫ 0.03. For further details, see Ref. […].
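The footnote's point is simply Bayes' theorem applied to a binary hypothesis. A minimal sketch, assuming the illustrative values P(female) = 0.5 and P(pregnant|male) = 0 (only P(pregnant|female) = 0.03 comes from the text):

```python
def posterior(prior_h, like_h, like_not_h):
    """Bayes' theorem for a binary hypothesis H: P(H|D) = P(D|H) P(H) / P(D)."""
    p_data = like_h * prior_h + like_not_h * (1.0 - prior_h)  # total probability of D
    return like_h * prior_h / p_data

p_pregnant_given_female = 0.03  # from the footnote
# Illustrative assumptions: half the population is female; men cannot be pregnant.
p_female_given_pregnant = posterior(prior_h=0.5, like_h=p_pregnant_given_female, like_not_h=0.0)
print(p_female_given_pregnant)
```

The two conditional probabilities differ by more than an order of magnitude, which is exactly the gap between "probability of the data given the hypothesis" and "probability of the hypothesis given the data".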